It can be seen that the F statistic should be maximised. This is because [...]ng $S$ is equivalent to the maximisation of the F statistic and [...]ng $S$ is equivalent to the maximisation of the F statistic as well.

Different cluster numbers yield different F statistic values. Finding the best cluster model structure therefore requires maximising the F statistic. This means that, for a specified cluster number, the best cluster model is the one with the maximised F statistic.
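As a minimal sketch of why a tighter partition gives a larger F statistic, the snippet below computes a one-way ANOVA style F value for one-dimensional points with a fixed number of clusters. The form (between-cluster scatter over K - 1, divided by within-cluster scatter over N - K), the function name `f_statistic` and the variable names are assumptions made for illustration; they may differ from the exact definition used earlier in the text.

```python
from statistics import fmean


def f_statistic(points, labels):
    """Assumed form: (S_between / (K - 1)) / (S_within / (N - K)) for 1-D points."""
    # Group the points by their cluster label.
    clusters = {}
    for x, c in zip(points, labels):
        clusters.setdefault(c, []).append(x)

    n, k = len(points), len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least two clusters and more points than clusters")

    grand_mean = fmean(points)
    s_between = 0.0
    s_within = 0.0
    for members in clusters.values():
        centre = fmean(members)
        # Scatter of the cluster centre around the grand mean, weighted by size.
        s_between += len(members) * (centre - grand_mean) ** 2
        # Scatter of the members around their own cluster centre.
        s_within += sum((x - centre) ** 2 for x in members)

    if s_within == 0:
        return float("inf")
    return (s_between / (k - 1)) / (s_within / (n - k))


# Hypothetical toy data: two obvious groups around 1 and 5.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
f_good = f_statistic(points, [0, 0, 0, 1, 1, 1])  # partition matching the groups
f_bad = f_statistic(points, [0, 1, 0, 1, 0, 1])   # partition mixing the groups
```

With the same data and the same number of clusters, the partition that matches the two groups has the smaller within-cluster scatter and therefore the larger F value, which is the sense in which the best model for a specified cluster number maximises the F statistic.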

Because it is not easy to determine a good threshold for the F statistic, the p value is used for the significance analysis. Figure 2.30(a) shows a toy data set with nine clusters. The distribution of p values is shown in Figure 2.30(b), where 15 models, which employed two, three, and so on up to 16 clusters, were constructed. If the critical p value was 0.01, the selected model structure had eight clusters because it was the first cluster structure with a p value less than 0.01 in the p value distribution. It was one cluster less than the true cluster number.
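The selection rule described above can be sketched as follows. The function name `select_cluster_number` is hypothetical, "first" is assumed to mean the smallest cluster number, and the p values in the example are invented for illustration only; they are not the values plotted in the figure.

```python
def select_cluster_number(p_values, critical_p=0.01):
    """Return the smallest cluster number whose p value is below critical_p.

    p_values maps a cluster number to the p value of that model's F statistic.
    Returns None when no model reaches the critical p value.
    """
    for k in sorted(p_values):
        if p_values[k] < critical_p:
            return k
    return None


# Hypothetical p values for models with two to seven clusters.
example = {2: 0.40, 3: 0.22, 4: 0.09, 5: 0.03, 6: 0.004, 7: 0.001}
chosen = select_cluster_number(example)
```

In this made-up example the rule returns six, the first cluster number whose p value drops below 0.01, even though the model with seven clusters has a smaller p value. The same behaviour explains how such a rule can stop one cluster short of the true cluster number.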

Figure 2.30 The K-means simulation for data with nine clusters. (a) Visualisation of the models with seven, eight, nine and ten clusters. (b) The F statistic p value distribution of the K-means models constructed for the data.